Documentation for NemoClaw#129

Closed
kirit93 wants to merge 57 commits into main from kirit93/documentation

Conversation

@kirit93
Collaborator

@kirit93 kirit93 commented Mar 5, 2026

Added documentation for NemoClaw.

@github-actions

github-actions bot commented Mar 5, 2026

All contributors have signed the DCO ✍️ ✅
Posted by the DCO Assistant Lite bot.

@kirit93
Collaborator Author

kirit93 commented Mar 5, 2026

I have read the DCO document and I hereby sign the DCO.

johntmyers and others added 15 commits March 5, 2026 13:31
The create-spike skill was splitting findings between the issue body and
a follow-up comment. This made spike results harder to read and review.
Merge the technical investigation section into the issue body template
and remove the comment-posting step entirely.

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
Closes #32

Add post-condition checks in drop_privileges() to verify that setgid()
and setuid() actually changed the effective IDs. Also verify that
setuid(0) fails after dropping privileges, confirming root cannot be
re-acquired. This is a defense-in-depth hardening measure per CWE-250
and CERT POS37-C. All added syscalls (geteuid, getegid, setuid) are
async-signal-safe, so they are safe in the pre_exec context.

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
The entrypoint script uses 'openssl rand -hex 32' to generate the SSH
handshake secret (added in c6919aa), but the rancher/k3s base image
does not include openssl. This causes the container to exit with code
127 immediately on startup.

Fixes #136
Closes #130

Rename the TUI from "Gator" to use generic naming:
- CLI subcommand: `nemoclaw gator` → `nemoclaw term`
- TUI title bar: "Gator" → "NemoClaw"
- Docs/skills: "Gator" → "the TUI" or "NemoClaw TUI"
- File renames: gator.toml → term.toml, gator.md → tui.md
- Remove dead link to nonexistent plans/gator-tui.md

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
…135)

* feat(policy): add validation layer to reject unsafe sandbox policies

Add policy validation that rejects root process identity, path
traversal sequences, overly broad filesystem paths, and filesystem rule
counts above the limit. Validation runs at three entry points:
disk-loaded YAML policies (fallback to restrictive default on violation),
gRPC CreateSandbox, and gRPC UpdateSandboxPolicy (returns
INVALID_ARGUMENT). Filesystem paths are normalized before storage to
collapse traversal components.
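
The checks described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `SandboxPolicy` shape, the `PolicyViolation` variants, the `MAX_FILESYSTEM_RULES` value, and the choice of `/` as the only "overly broad" path are all assumptions made for the example.

```rust
use std::path::{Component, Path, PathBuf};

// Hypothetical limit; the real value is not stated in the commit message.
const MAX_FILESYSTEM_RULES: usize = 256;

// Hypothetical, heavily simplified policy shape.
struct SandboxPolicy {
    run_as_user: String,
    filesystem_paths: Vec<String>,
}

#[derive(Debug, PartialEq)]
enum PolicyViolation {
    RootIdentity,
    PathTraversal(String),
    OverlyBroadPath(String),
    TooManyRules(usize),
}

/// Collapse `.` and `..` components lexically, without touching the filesystem.
fn normalize(path: &str) -> PathBuf {
    let mut out = PathBuf::new();
    for comp in Path::new(path).components() {
        match comp {
            Component::ParentDir => {
                out.pop();
            }
            Component::CurDir => {}
            other => out.push(other.as_os_str()),
        }
    }
    out
}

/// Return the first violation found, or Ok(()) if the policy passes all checks.
fn validate(policy: &SandboxPolicy) -> Result<(), PolicyViolation> {
    if policy.run_as_user == "root" || policy.run_as_user == "0" {
        return Err(PolicyViolation::RootIdentity);
    }
    if policy.filesystem_paths.len() > MAX_FILESYSTEM_RULES {
        return Err(PolicyViolation::TooManyRules(policy.filesystem_paths.len()));
    }
    for p in &policy.filesystem_paths {
        if Path::new(p).components().any(|c| c == Component::ParentDir) {
            return Err(PolicyViolation::PathTraversal(p.clone()));
        }
        // Assumption: only the filesystem root counts as "overly broad" here.
        if normalize(p).as_path() == Path::new("/") {
            return Err(PolicyViolation::OverlyBroadPath(p.clone()));
        }
    }
    Ok(())
}
```

In this sketch a caller loading a policy from disk would fall back to a restrictive default on `Err`, while a gRPC handler would map the error to `INVALID_ARGUMENT`, matching the entry points listed above.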

Closes #33

* fix(e2e): correct policy update test to match immutable field behavior

The update policy test was asserting on validation errors for fields
(process, filesystem) that are immutable on live sandboxes. The server
rejects changes to these fields before validation runs. Updated the test
to verify the immutability guard instead.

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
Closes #27

Add remove() methods to TracingLogBus, PlatformEventBus, and
SandboxWatchBus to clean up entries when sandboxes are deleted.
Wire cleanup into both handle_deleted (K8s reconciler) and
delete_sandbox (gRPC handler). Reorder watch_sandbox to validate
sandbox existence before subscribing to buses, preventing entries
for non-existent IDs. Add one-time sandbox validation at stream
open in push_sandbox_logs.
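
As a rough illustration of the cleanup pattern, the sketch below models a per-sandbox bus as a map keyed by sandbox ID with a `remove()` method. The `SandboxBus` type and its fields are hypothetical stand-ins; the real TracingLogBus, PlatformEventBus, and SandboxWatchBus are not shown in this conversation.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

/// Hypothetical stand-in for a bus that keeps per-sandbox state.
struct SandboxBus {
    entries: Mutex<HashMap<String, Vec<String>>>,
}

impl SandboxBus {
    fn new() -> Self {
        Self { entries: Mutex::new(HashMap::new()) }
    }

    /// Publishing creates the entry on first use, which is why entries
    /// leak if nothing removes them after the sandbox is deleted.
    fn publish(&self, sandbox_id: &str, msg: &str) {
        self.entries
            .lock()
            .unwrap()
            .entry(sandbox_id.to_string())
            .or_default()
            .push(msg.to_string());
    }

    /// Drop all state for a sandbox; returns true if an entry existed.
    fn remove(&self, sandbox_id: &str) -> bool {
        self.entries.lock().unwrap().remove(sandbox_id).is_some()
    }

    /// Number of sandboxes that currently have an entry.
    fn len(&self) -> usize {
        self.entries.lock().unwrap().len()
    }
}
```

Both deletion paths named above (the reconciler and the gRPC handler) would call `remove()` with the deleted sandbox's ID, so a second call for the same ID is harmless and simply returns `false`.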

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
…140)

Closes #26

All list RPCs (ListSandboxes, ListProviders, ListSandboxPolicies,
ListInferenceRoutes) passed the client-provided limit directly to SQL
queries with no upper bound. A client could send limit=u32::MAX and
cause the server to load all records into memory, risking OOM. This
introduces a MAX_PAGE_SIZE constant (1000) and a clamp_limit helper
that caps the limit in every list handler before it reaches the
persistence layer.
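
A helper along these lines would be enough to enforce the cap. The names `MAX_PAGE_SIZE` and `clamp_limit` and the value 1000 come from the commit message; the `u32` signature and the body are assumptions for illustration.

```rust
/// Upper bound on page size for list RPCs (value taken from the commit message).
const MAX_PAGE_SIZE: u32 = 1000;

/// Cap a client-provided limit before it reaches the persistence layer.
/// Assumed signature; the actual helper may differ.
fn clamp_limit(requested: u32) -> u32 {
    requested.min(MAX_PAGE_SIZE)
}
```

With this shape, `clamp_limit(u32::MAX)` yields 1000 and smaller values pass through unchanged.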

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
…tion (#145)

* fix(server): add field-level size limits to sandbox and provider creation

Closes #24

Add validate_sandbox_spec and provider field validation with named
constants. Configure explicit 1MB tonic max_decoding_message_size.
Inference routes excluded per #133 rearchitecture.
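
Field-level limits with named constants might look like the sketch below. Only the function name `validate_sandbox_spec` comes from the commit message; the `SandboxSpec` fields, the constant names, and every numeric limit are hypothetical.

```rust
// Hypothetical limits; the actual constants are not listed in this conversation.
const MAX_NAME_LEN: usize = 253;
const MAX_ENV_ENTRIES: usize = 128;
const MAX_ENV_VALUE_LEN: usize = 4096;

// Hypothetical, simplified spec shape.
struct SandboxSpec {
    name: String,
    env: Vec<(String, String)>,
}

/// Reject specs whose fields exceed the named limits.
fn validate_sandbox_spec(spec: &SandboxSpec) -> Result<(), String> {
    if spec.name.len() > MAX_NAME_LEN {
        return Err(format!("name exceeds {} bytes", MAX_NAME_LEN));
    }
    if spec.env.len() > MAX_ENV_ENTRIES {
        return Err(format!("more than {} env entries", MAX_ENV_ENTRIES));
    }
    for (key, value) in &spec.env {
        if value.len() > MAX_ENV_VALUE_LEN {
            return Err(format!("env value for {} exceeds {} bytes", key, MAX_ENV_VALUE_LEN));
        }
    }
    Ok(())
}
```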

* chore: remove issue number references from code comments

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
…move implicit catch-all (#146)

Remove multi-route CRUD system and replace with single managed cluster
route (inference.local). Key changes:

- Remove inference route CRUD RPCs and CLI commands
- Remove InspectForInference OPA action; policy is binary allow/deny
- Introduce AuthHeader enum and InferenceProviderProfile in navigator-core
- Router is now provider-agnostic: auth style carried on ResolvedRoute
- Replace InferenceRouteSpec with ClusterInferenceConfig (2 fields vs 8)
- Rename proto: routing_hint->name, SandboxResolvedRoute->ResolvedRoute,
  GetSandboxInferenceBundle->GetInferenceBundle, drop sandbox_id param
- Rename RouteConfig.route -> RouteConfig.name; use inference.local
- Add 'nemoclaw cluster inference update' for partial config changes
- Delete stale navigator.inference.v1.rs checked-in proto file
- Update architecture docs, agent skills, and CLI reference
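
One way to read "auth style carried on ResolvedRoute" is sketched below. The type names `AuthHeader` and `ResolvedRoute` appear in the commit message, but the variants, the fields, and the `auth_header` helper are assumptions made purely for illustration.

```rust
/// Hypothetical variants: how the API key is attached to an upstream request.
#[derive(Debug, Clone, PartialEq)]
enum AuthHeader {
    /// Sends `Authorization: Bearer <key>`.
    Bearer,
    /// Sends `<header-name>: <key>` using the given header name.
    Custom(String),
}

/// Hypothetical fields; the real struct is defined in the proto.
struct ResolvedRoute {
    name: String,
    endpoint: String,
    auth: AuthHeader,
}

/// The router stays provider-agnostic: it only looks at `route.auth`.
fn auth_header(route: &ResolvedRoute, api_key: &str) -> (String, String) {
    match &route.auth {
        AuthHeader::Bearer => ("Authorization".to_string(), format!("Bearer {}", api_key)),
        AuthHeader::Custom(header) => (header.clone(), api_key.to_string()),
    }
}
```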

Closes #133
@pimlock
Collaborator

pimlock commented Mar 6, 2026

FYI @kirit93, I just merged the local inference PR: NVIDIA/NemoClaw#146 and this will impact docs around it (pretty much a full rewrite).

I can take a pass and push to this branch, would that work?

@miyoungc
Collaborator

miyoungc commented Mar 6, 2026

> FYI @kirit93, I just merged the local inference PR: #146 and this will impact docs around it (pretty much a full rewrite).
>
> I can take a pass and push to this branch, would that work?

please use PR 124 @pimlock

drew and others added 8 commits March 6, 2026 16:26
Pass the computed cargo version through Docker and cluster packaging paths so deployed clusters report the built artifact version while keeping latest and explicit version tags aligned.
The reusable Docker build workflow runs under sh by default, and the compute-version step only uses a single command substitution. Removing pipefail avoids the illegal option error without changing the explicit tag-fetch setup.
#158)

* feat(proxy): support plain HTTP forward proxy for private IP endpoints

Add forward proxy mode to the sandbox proxy so that standard HTTP
libraries (httpx, requests, etc.) work with HTTP_PROXY for plain HTTP
calls to private IP endpoints. Previously, non-CONNECT methods were
unconditionally rejected with 403.

The forward proxy path requires all three conditions to be met:
- OPA policy explicitly allows the destination
- The matched endpoint has allowed_ips configured
- All resolved IPs are RFC 1918 private

This ensures plain HTTP never reaches the public internet while enabling
seamless access to internal services without custom CONNECT tunnel code.
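
The three-condition gate can be sketched as a single predicate. The function name and parameters are hypothetical, and treating IPv6 addresses and an empty resolution result as "not private" is an assumption of this sketch. `Ipv4Addr::is_private` from the standard library covers the RFC 1918 ranges.

```rust
use std::net::IpAddr;

/// Decide whether a plain HTTP request may take the forward proxy path.
/// All three conditions from the commit message must hold.
fn forward_proxy_allowed(
    policy_allows: bool,
    endpoint_has_allowed_ips: bool,
    resolved: &[IpAddr],
) -> bool {
    // Assumption: no addresses, or any IPv6 address, fails the private-IP gate.
    let all_private = !resolved.is_empty()
        && resolved.iter().all(|ip| match ip {
            IpAddr::V4(v4) => v4.is_private(),
            IpAddr::V6(_) => false,
        });
    policy_allows && endpoint_has_allowed_ips && all_private
}
```

Requiring every resolved address to be private, rather than any one of them, is what keeps a hostname with mixed public and private records from reaching the public internet over plain HTTP.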

Implementation:
- parse_proxy_uri(): parses absolute-form URIs into components
- rewrite_forward_request(): rewrites to origin-form, strips hop-by-hop
  headers, adds Via and Connection: close
- handle_forward_proxy(): full handler with OPA eval, SSRF checks,
  private-IP gate, upstream connect, and bidirectional relay
- Updated dispatch in handle_tcp_connection to route non-CONNECT methods

Includes 14 unit tests and 6 E2E tests (FWD-1 through FWD-6).
CONNECT path remains completely untouched.
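
The parsing and rewriting steps listed under Implementation could look roughly like this. Only the function names `parse_proxy_uri` and `rewrite_forward_request` come from the commit message; the signatures, the `ForwardTarget` type, the header list (an illustrative subset of hop-by-hop headers), and the `sandbox-proxy` Via pseudonym are assumptions. The sketch handles `http://` URIs only and does not handle IPv6 literals.

```rust
/// Hypothetical parsed form of an absolute-form request target.
#[derive(Debug, PartialEq)]
struct ForwardTarget {
    host: String,
    port: u16,
    path: String,
}

// Illustrative subset of hop-by-hop headers to strip.
const HOP_BY_HOP: [&str; 7] = [
    "connection",
    "keep-alive",
    "proxy-connection",
    "proxy-authenticate",
    "proxy-authorization",
    "te",
    "upgrade",
];

/// Parse `http://host[:port][/path]` into its components.
fn parse_proxy_uri(uri: &str) -> Option<ForwardTarget> {
    let rest = uri.strip_prefix("http://")?;
    let (authority, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, "/"),
    };
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, p.parse::<u16>().ok()?),
        None => (authority, 80),
    };
    if host.is_empty() {
        return None;
    }
    Some(ForwardTarget { host: host.to_string(), port, path: path.to_string() })
}

/// Build an origin-form request head: drop hop-by-hop headers,
/// then append Via and Connection: close.
fn rewrite_forward_request(
    method: &str,
    target: &ForwardTarget,
    headers: &[(&str, &str)],
) -> String {
    let mut out = format!("{} {} HTTP/1.1\r\n", method, target.path);
    for (name, value) in headers {
        if HOP_BY_HOP.iter().any(|h| h.eq_ignore_ascii_case(name)) {
            continue;
        }
        out.push_str(&format!("{}: {}\r\n", name, value));
    }
    out.push_str("Via: 1.1 sandbox-proxy\r\n");
    out.push_str("Connection: close\r\n\r\n");
    out
}
```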

Closes #155

* fix(proxy): remove InspectForInference match arm removed by #146

The inference routing simplification in #146 reduced NetworkAction to
Allow/Deny, removing InspectForInference. Drop the dead match arm from
handle_forward_proxy.

* fix(sandbox): restore BestEffort as default Landlock compatibility

The Landlock V2 upgrade in #151 changed the default from BestEffort to
HardRequirement. This causes all proxy-mode sandboxes to crash with
Permission denied when the policy omits the landlock field, because the
child process gets locked to only /etc/navigator-tls and /sandbox.

Restore BestEffort as the default so policies without an explicit
landlock field degrade gracefully.
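
A default of this kind can be expressed with a `Default` impl on the compatibility enum. The variant names follow the commit message; the enum name, the `Policy` struct, and the helper are hypothetical.

```rust
/// Variant names follow the commit message; the enum name is assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
enum LandlockCompatibility {
    /// Apply what the kernel supports and continue if something is missing.
    #[default]
    BestEffort,
    /// Fail if the requested restrictions cannot be fully applied.
    HardRequirement,
}

/// Hypothetical policy fragment where the landlock field may be omitted.
struct Policy {
    landlock: Option<LandlockCompatibility>,
}

/// Policies without an explicit landlock field fall back to BestEffort.
fn effective_compatibility(policy: &Policy) -> LandlockCompatibility {
    policy.landlock.unwrap_or_default()
}
```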

Fixes #161

* fix(sandbox): inject baseline filesystem paths for proxy-mode sandboxes

Proxy-mode sandboxes need baseline filesystem paths (/usr, /lib, /etc,
/app, /var/log read-only; /sandbox, /tmp read-write) for the child
process to function under Landlock. Without these, the child can't exec
binaries, resolve DNS, or load shared libraries.

The supervisor now enriches the policy with these baseline paths at
startup, covering both standalone (file) and gateway (gRPC) modes. For
gateway mode, the enriched policy is synced back so users see the
effective policy via 'nemoclaw sandbox get'.

The gateway validation is relaxed to allow additive filesystem changes
(new paths can be added, existing paths cannot be removed) to support
the supervisor's enrichment sync-back.
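
The enrichment and the relaxed "additive only" check can be sketched together. The baseline paths are the ones listed in the commit message; the `FilesystemPolicy` shape and the function names `enrich` and `is_additive` are hypothetical.

```rust
use std::collections::BTreeSet;

// Baseline paths as listed in the commit message.
const BASELINE_READ_ONLY: [&str; 5] = ["/usr", "/lib", "/etc", "/app", "/var/log"];
const BASELINE_READ_WRITE: [&str; 2] = ["/sandbox", "/tmp"];

/// Hypothetical, simplified filesystem policy.
#[derive(Debug, Clone, Default, PartialEq)]
struct FilesystemPolicy {
    read_only: BTreeSet<String>,
    read_write: BTreeSet<String>,
}

/// Add the baseline paths to whatever the user's policy already contains.
fn enrich(mut policy: FilesystemPolicy) -> FilesystemPolicy {
    policy.read_only.extend(BASELINE_READ_ONLY.iter().map(|s| s.to_string()));
    policy.read_write.extend(BASELINE_READ_WRITE.iter().map(|s| s.to_string()));
    policy
}

/// An update is additive if every existing path is still present.
fn is_additive(old: &FilesystemPolicy, new: &FilesystemPolicy) -> bool {
    old.read_only.is_subset(&new.read_only) && old.read_write.is_subset(&new.read_write)
}
```

Under this model the supervisor's sync-back of an enriched policy passes the gateway check, while an update that drops an existing path does not.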

Includes 2 E2E tests: BFS-1 (missing filesystem_policy) and BFS-2
(incomplete filesystem_policy).

Fixes #161

* fix(e2e): update assertion for relaxed filesystem validation message

---------

Co-authored-by: John Myers <johntmyers@users.noreply.github.com>
@kirit93 kirit93 closed this Mar 9, 2026
@kirit93 kirit93 deleted the kirit93/documentation branch March 9, 2026 19:12

5 participants